# Euclidean Distance & Centroids

## Euclidean Distance

**Geogebra.** Visualize the vectors and the distance between them, and contrast that distance with the angle between them, in the context of _Hannah Montana_ and _The IT Crowd_.
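The contrast between distance and angle can also be checked numerically. The sketch below is only an illustration: the two vectors are hypothetical watch counts, not data from this notebook, chosen so that `b` has the same proportions as `a` at a tenth of the volume.

```python
import numpy as np
from numpy.linalg import norm

# Hypothetical watch counts for two shows (assumed values for illustration).
# b watches in the same proportion as a, but ten times less.
a = np.array([30.0, 35.0])
b = np.array([3.0, 3.5])

# Euclidean distance: sensitive to how much each user watches
distance = norm(a - b)

# Cosine of the angle: sensitive only to the direction of the vectors
cosine = np.dot(a, b) / (norm(a) * norm(b))
# clip guards against rounding pushing the cosine slightly outside [-1, 1]
angle_deg = np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

print(distance, cosine, angle_deg)
```

The two vectors point in the same direction, so the angle is essentially zero, while the Euclidean distance between them is large.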

In [6]:
import numpy as np
from numpy.linalg import norm

In [7]:
# input. vector v
#        vector u
# output. Euclidean distance between v and u
def euclidDistance(v, u):
  return norm(v - u)

In [8]:
# right triangle with legs 4 and 3, so the hypotenuse is 5
euclidDistance( np.array([4, 0]), np.array([0, 3]) )

5.0
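For reference, the result above follows from the standard definition of Euclidean distance, which is what `norm(v - u)` computes for one-dimensional arrays (NumPy's default vector norm is the 2-norm). Worked out for the two points used in the cell:

```latex
d(v, u) = \lVert v - u \rVert_2 = \sqrt{\sum_{i} (v_i - u_i)^2}

d\big((4, 0), (0, 3)\big) = \sqrt{(4 - 0)^2 + (0 - 3)^2} = \sqrt{16 + 9} = 5
```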

## Centroids Representing a Segment

In [9]:
# users x TV shows matrix of watching records (rows: users, columns: shows)
M = np.array([
    [30, 35, 0, 0],
    [5, 0, 0, 30],
    [0, 0, 32, 20],
    [0, 0, 34, 5]
])

Nearest centroid for a given user

In [10]:
# action adventure
cen_0 = np.array([28, 28, 0, 0])
# family comedy
cen_1 = np.array([0, 0, 30, 2])

centroids = [ cen_0, cen_1 ]

In [11]:
# input. vector user
#        List centroids
# output. the centroid closest to user by Euclidean distance
def nearestCentroid(user, centroids):
  return min( centroids, key = lambda c: euclidDistance(c, user) )

In [12]:
# Antar's centroid, Action-adventure
nearestCentroid( M[0] , centroids )

array([28, 28,  0,  0])

In [13]:
# Tameem's centroid, Family-comedy
nearestCentroid( M[1] , centroids )

array([ 0,  0, 30,  2])

In [14]:
# Logain's centroid, Family-comedy
nearestCentroid( M[2] , centroids )

array([ 0,  0, 30,  2])

In [15]:
# Jumana's centroid, Family-comedy
nearestCentroid( M[3] , centroids )

array([ 0,  0, 30,  2])
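The centroids above were written by hand. One possible way to derive a centroid from a segment of users is the (weighted) average of their rows. The sketch below assumes the segment is the last two rows of `M`. The weights are hypothetical, for example how much each user is trusted to represent the segment, and the variable names are illustrative.

```python
import numpy as np

# Segment assumed for illustration: the rows M[2] and M[3] of the matrix above
segment = np.array([
    [0, 0, 32, 20],
    [0, 0, 34, 5],
])

# Plain average: every user counts equally
plain_centroid = segment.mean(axis=0)

# Weighted average: hypothetical weights, the second user counts three times as much
weights = np.array([1.0, 3.0])
weighted_centroid = np.average(segment, axis=0, weights=weights)

print(plain_centroid)     # [ 0.   0.  33.  12.5]
print(weighted_centroid)  # [ 0.    0.   33.5   8.75]
```

The weighted centroid is pulled toward the user with the larger weight.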

## Representation Measure as a Sum of Euclidean Distances

In [16]:
# input. Matrix M of users
#        List centroids
# output. Sum of the Euclidean distances from each user to its nearest centroid
def centroidsWeight(M, centroids):
  # list of distances of each user to its centroid
  distancesList = [ euclidDistance(
                      # user
                      i,
                      # user's centroid
                      nearestCentroid(i, centroids)
                    # loop on each row
                  ) for i in M
              ]

  # sum all distances
  return np.sum(distancesList)

In [17]:
centroidsWeight(M, centroids)

71.73093338274413
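As a cross-check, the same measure can be computed without an explicit loop by using NumPy broadcasting. This is a sketch of an alternative formulation, not a replacement for the cell above, and the name `centroids_weight_vectorized` is hypothetical.

```python
import numpy as np

M = np.array([
    [30, 35, 0, 0],
    [5, 0, 0, 30],
    [0, 0, 32, 20],
    [0, 0, 34, 5],
])
centroids = np.array([
    [28, 28, 0, 0],
    [0, 0, 30, 2],
])

def centroids_weight_vectorized(M, centroids):
    C = np.asarray(centroids)
    # diffs[i, j] = M[i] - C[j], shape (users, centroids, shows)
    diffs = M[:, None, :] - C[None, :, :]
    # dists[i, j] = Euclidean distance from user i to centroid j
    dists = np.linalg.norm(diffs, axis=2)
    # keep each user's distance to its nearest centroid, then sum
    return dists.min(axis=1).sum()

weight = centroids_weight_vectorized(M, centroids)
print(weight)
```

For the matrix and centroids used in this notebook, the value should agree with the output of `centroidsWeight(M, centroids)` up to floating-point rounding.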

## Task

In [18]:
# generate random numbers
from random import random
# random() returns a float in [0.0, 1.0)
#random()
# scale it to [0.0, 100.0)
100 * random()

72.21287712503687

- Read section 7.3 from Falk.
- Design a recommendation engine which utilizes centroids.
- Use the code above to generate random centroid positions. For each set of positions, compute `centroidsWeight`. Automate the procedure by iterating over 100 random sets of positions and selecting the centroids with the minimum weight.
- Given a segment of users whom you believe to have similar taste, how can we construct a centroid for them using the weighted average method?
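The random-search step in the task could be sketched roughly as follows. This is one possible starting point under stated assumptions: it uses NumPy's `default_rng` instead of `random()`, draws each coordinate uniformly from 0 to 100, and fixes the number of centroids at 2. The names `centroids_weight` and `random_search` and all their parameters are hypothetical.

```python
import numpy as np

M = np.array([
    [30, 35, 0, 0],
    [5, 0, 0, 30],
    [0, 0, 32, 20],
    [0, 0, 34, 5],
])

def centroids_weight(M, centroids):
    # sum over users of the distance to each user's nearest centroid
    diffs = M[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1).sum()

def random_search(M, k=2, trials=100, low=0.0, high=100.0, seed=0):
    rng = np.random.default_rng(seed)
    best_centroids, best_weight = None, np.inf
    for _ in range(trials):
        # one random position per centroid, one coordinate per show
        candidate = rng.uniform(low, high, size=(k, M.shape[1]))
        w = centroids_weight(M, candidate)
        # keep the candidate set with the smallest weight seen so far
        if w < best_weight:
            best_centroids, best_weight = candidate, w
    return best_centroids, best_weight

best_centroids, best_weight = random_search(M)
print(best_centroids)
print(best_weight)
```

Pure random search is unlikely to find very good centroids in 100 trials; it mainly gives a baseline weight to compare other constructions against.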